Abstract

We tackle the issue of the blind prediction of a Gaussian time series. For this, we construct a projection operator built by plugging an empirical covariance estimate into a Schur complement decomposition of the projector. This operator is then used to compute the predictor. Rates of convergence of the estimates are given.
Introduction



In many concrete situations the statistician observes a finite path $X_1,\dots,X_n$ of a real temporal phenomenon which can be modeled as realizations of a stationary process $X := (X_t)_{t\in\mathbb{Z}}$ (we refer, for example, to [9], [12] and references therein).

Here we consider a second order weakly stationary process, which implies that its mean is constant and that $\mathbb{E}(X_tX_s)$ only depends on the distance between $t$ and $s$. In the sequel, we will assume that the process is Gaussian, which implies that it is also strongly stationary, in the sense that

$$(X_1,\dots,X_n)\overset{\mathcal{L}}{=}(X_{t+1},\dots,X_{t+n}),\qquad (t\in\mathbb{Z},\ n\in\mathbb{N}).$$

Our aim is to predict this series when only a finite number of past values are observed. Moreover, we want a sharp control of the prediction error. For this, recall that, for Gaussian processes, the best predictor of $X_t$, $t\ge 0$, when observing $X_{-N},\dots,X_{-1}$, is obtained by a suitable linear combination of the $(X_i)_{i=-N,\dots,-1}$. This predictor, which converges to the predictor onto the infinite past, depends on the unknown covariance of the time series. Thus, this covariance has to be estimated. Here, we are facing a blind filtering problem, which is a major difficulty with regard to the usual prediction framework.

Kriging methods often impose a parametric model for the covariance (see [13], [3], [12]). This kind of spatial prediction is close to our work. Nonparametric estimation may be done in a functional way (see [5], [1], [2]). This approach is not efficient in the blind framework. Here, the blind problem is bypassed using an idea of Bickel [1] for the estimation of the inverse of the covariance. He shows that the inverse of the empirical estimate of the covariance is a good choice when many samples are at hand.

We propose in this paper a new methodology, when only a path of the process is observed. For this, following Comte [10], we build an accurate estimate of the projection operator. Finally, this estimated projector is used to build a predictor for the future values of the process. Asymptotic properties of these estimators are studied.

The paper falls into the following parts. In Section 1, definitions and technical properties of time series are given. Section 2 is devoted to the construction of the empirical projection operator, whose asymptotic behavior is stated in Section 3. Finally, we build a prediction of the future values of the process in Section 4. All the proofs are gathered in Section 5.

1 Notations and preliminary definitions 

In this section, we present our general framework and recall some basic properties of time series, focusing on their prediction.

Let $X = (X_k)_{k\in\mathbb{Z}}$ be a zero-mean Gaussian stationary process. Observing a finite past $X_{-N},\dots,X_{-1}$ ($N\ge 1$) of the process, we aim at predicting the present value $X_0$ without any knowledge of the covariance operator.

Since $X$ is stationary, let $r_{|i-j|} := \mathrm{Cov}(X_i,X_j)$, $(i,j\in\mathbb{Z})$, be the covariance between $X_i$ and $X_j$. Here we will consider short range dependent processes, and thus we assume that

$$\sum_{k\in\mathbb{Z}} r_k^2 < +\infty.$$

So that there exists a measurable function $f^*\in L^2([0,2\pi))$ defined by

$$f^*(t) := \sum_{k=-\infty}^{+\infty} r_k e^{ikt}\quad (\text{a.e.}).$$

This function is the so-called spectral density of the time series. It is real, even and non-negative. As $X$ is Gaussian, the spectral density conveys all the information on the process distribution.



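Define the covariance operator $\Gamma$ of the process $X$ by setting

$$\forall i,j\in\mathbb{Z},\qquad \Gamma_{ij} := \mathrm{Cov}(X_i,X_j) = r_{|i-j|}.$$

Note that $\Gamma$ is the Toeplitz operator associated to $f^*$. It is usually denoted by $T(f^*)$ (for a thorough overview on the subject, we refer to [8]). This Hilbertian operator acts on $\ell^2(\mathbb{Z})$ as follows:

$$\forall u\in\ell^2(\mathbb{Z}),\ \forall i\in\mathbb{Z},\qquad (\Gamma u)_i := \sum_{j\in\mathbb{Z}}\Gamma_{ij}u_j = \sum_{j\in\mathbb{Z}} r_{|i-j|}u_j = (T(f^*)u)_i.$$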

For the sake of simplicity, we shall from now on denote Hilbertian operators as infinite matrices.

Recall that for any bounded Hilbertian operator $A$, the spectrum $\mathrm{Sp}(A)$ is defined as the set of complex numbers $\lambda$ such that $\lambda\,\mathrm{Id}-A$ is not invertible (here $\mathrm{Id}$ stands for the identity on $\ell^2(\mathbb{Z})$).

The spectrum of any Toeplitz operator associated with a bounded function satisfies the following property (see, for instance, [9]):

$$\forall f\in L^\infty([0,2\pi)),\qquad \mathrm{Sp}(T(f))\subset[\min(f),\max(f)].$$
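As an illustration of this inclusion, one can check it numerically on a finite $K\times K$ section of $T(f)$: such a section is a compression of the operator, so its eigenvalues lie in the same interval. The sketch below assumes Python with NumPy; the symbol $f(t) = 1+0.5\cos t$ and the helper `toeplitz_section` are hypothetical choices made for illustration only.

```python
import numpy as np

def toeplitz_section(r, K):
    """K x K symmetric Toeplitz matrix with entries r[|i - j|] (zero beyond len(r))."""
    r = np.concatenate([np.asarray(r, dtype=float), np.zeros(max(0, K - len(r)))])
    lags = np.abs(np.subtract.outer(np.arange(K), np.arange(K)))
    return r[lags]

# Hypothetical symbol f(t) = r_0 + 2 r_1 cos t with r_0 = 1, r_1 = 0.25, i.e. f(t) = 1 + 0.5 cos t.
r = [1.0, 0.25]
t = np.linspace(0.0, 2 * np.pi, 10_000, endpoint=False)
f = r[0] + 2 * r[1] * np.cos(t)

# Eigenvalues of the 50 x 50 section stay inside [min f, max f] = [0.5, 1.5].
eig = np.linalg.eigvalsh(toeplitz_section(r, 50))
print(f.min() <= eig.min(), eig.max() <= f.max())  # True True
```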

Now consider the main assumption of this paper:

Assumption 1.1.

$$\exists\, m, m' > 0,\ \forall t\in[0,2\pi),\qquad m\le f^*(t)\le m'.$$

This assumption ensures the invertibility of the covariance operator, since $f^*$ is bounded away from zero. As $\Gamma$ is a positive definite operator, we can define its square root $\Gamma^{1/2}$. Let $Q$ be any linear operator acting on $\ell^2(\mathbb{Z})$; consider the operator norm $\|Q\|_{2,op} := \sup_{u\in\ell^2(\mathbb{Z}),\,\|u\|_2=1}\|Qu\|_2$ and define the warped operator norm as

$$\|Q\|_\Gamma := \sup_{u\in\ell^2(\mathbb{Z}),\ \|\Gamma^{1/2}u\|_2=1}\big\|\Gamma^{1/2}Qu\big\|_2.$$

Note that, under Assumption 1.1, $\|\Gamma\|_{2,op}\le m'$, hence the warped norm $\|\cdot\|_\Gamma$ is well defined and equivalent to the classical one:

$$\frac{m}{m'}\|Q\|_{2,op}\le\|Q\|_\Gamma\le\frac{m'}{m}\|Q\|_{2,op}.$$

Finally, both the covariance operator and its inverse are continuous with respect to 
the previous norms. 

The warped norm is actually the natural induced norm over the Hilbert space

$$H = \big(\ell^2(\mathbb{Z}),\langle\cdot,\cdot\rangle_\Gamma\big),$$

where

$$\langle x,y\rangle_\Gamma := x^T\Gamma y = \sum_{i,j}x_i\Gamma_{ij}y_j.$$
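On finite matrices, the warped norm reduces to the spectral norm of $\Gamma^{1/2}Q\Gamma^{-1/2}$ (substitute $w=\Gamma^{1/2}u$ in the supremum). The following minimal sketch, assuming Python with NumPy and a hypothetical tridiagonal section of $T(f)$ for $f(t)=1+0.5\cos t$ (so that $m=0.5$ and $m'=1.5$), checks the equivalence bounds stated above; the function names are illustrative.

```python
import numpy as np

def sym_sqrt(G):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(G)
    return (V * np.sqrt(w)) @ V.T

def warped_norm(Q, G):
    """||Q||_Gamma = sup over ||G^{1/2} u||_2 = 1 of ||G^{1/2} Q u||_2,
    i.e. the spectral norm of G^{1/2} Q G^{-1/2}."""
    S = sym_sqrt(G)
    return np.linalg.norm(S @ Q @ np.linalg.inv(S), 2)

# Hypothetical finite section of T(f), f(t) = 1 + 0.5 cos t: eigenvalues in (0.5, 1.5).
n, m, m_prime = 30, 0.5, 1.5
G = np.eye(n) + 0.25 * (np.eye(n, k=1) + np.eye(n, k=-1))
Q = np.random.default_rng(1).standard_normal((n, n))
op, wn = np.linalg.norm(Q, 2), warped_norm(Q, G)
print(m / m_prime * op <= wn <= m_prime / m * op)  # True
```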

From now on, all the operators are defined on $H$. Set

$$L^2(\mathbb{P}) := \big\{Y\in\mathrm{Span}((X_i)_{i\in\mathbb{Z}}),\ \mathbb{E}[Y^2]<+\infty\big\}.$$
The following proposition (see for instance [9]) shows the particular interest of $H$:

Proposition 1.2. The map

$$\Phi:\ H\to L^2(\mathbb{P}),\qquad u\mapsto u^TX = \sum_{i\in\mathbb{Z}}u_iX_i$$

defines a canonical isometry between $H$ and $L^2(\mathbb{P})$.

The isometry will enable us to consider, in the proofs, alternatively sequences $u\in H$ or the corresponding random variables $Y\in L^2(\mathbb{P})$.

We will use the following notation: recall that $\Gamma$ is the covariance operator and denote, for any $A,B\subset\mathbb{Z}$, the corresponding minor $(A,B)$ by

$$\Gamma_{AB} := (\Gamma_{ij})_{i\in A,\,j\in B}.$$

Note that, when $A$ and $B$ are finite, $\Gamma_{AB}$ is the covariance matrix between $(X_i)_{i\in A}$ and $(X_j)_{j\in B}$. Diagonal minors will be simply written $\Gamma_A := \Gamma_{AA}$, for any $A\subset\mathbb{Z}$.

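In our prediction framework, let $O\subset\mathbb{Z}$ and assume that we observe the process $X$ at times $i\in O$. It is well known that, for Gaussian processes, the best linear prediction of a random variable $Y$ by the observed variables $(X_i)_{i\in O}$ is also the best prediction, defined by $P_O(Y) := \mathbb{E}[Y\,|\,(X_i)_{i\in O}]$. Using the isometry, there exist unique $u\in H$ and $v\in H$ with $Y=\Phi(u)$ and $P_O(Y)=\Phi(v)$. Hence, we can define a projection operator acting on $H$ by setting $p_O(u) := v$. This corresponds to the natural projection in $H$ onto the span of $\{e_i,\ i\in O\}$, where $(e_i)_{i\in\mathbb{Z}}$ denotes the canonical basis of $\ell^2(\mathbb{Z})$. Note that this projection operator may be written blockwise (rows and columns indexed by $O$ and $O^c$) as

$$p_Ou := \begin{pmatrix}\mathrm{Id}_O & \Gamma_O^{-1}\Gamma_{OO^c}\\ 0 & 0\end{pmatrix}u.$$

The operator $\Gamma_O^{-1}$ is well defined since $f^*\ge m>0$. Finally, the best prediction observing $(X_i)_{i\in O}$ is

$$\mathbb{E}\big[Y=\Phi(u)\,\big|\,(X_i)_{i\in O}\big] = P_O(\Phi(u)) = \Phi(p_Ou).$$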
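For a finite observation set, the block $\Gamma_O^{-1}\Gamma_{OO^c}$ restricted to a target index gives the weights of the best predictor. The following sketch (Python with NumPy) computes them when the covariance is known. The AR(1) covariance $r_k=\phi^{|k|}/(1-\phi^2)$ is a hypothetical test case, chosen because the answer is known in closed form: $\mathbb{E}[X_0\,|\,X_{-K},\dots,X_{-1}]=\phi X_{-1}$. Function names are illustrative.

```python
import numpy as np

def ar1_cov(lags, phi):
    """Covariance r_{|k|} = phi^{|k|} / (1 - phi^2) of an AR(1) with unit innovations (test case)."""
    return phi ** np.abs(lags) / (1.0 - phi**2)

def projection_block(cov, O, B):
    """Block Gamma_O^{-1} Gamma_{OB} of p_O: column j holds the weights of
    (X_i)_{i in O} in the best prediction of X_j, for j in B."""
    G_O = cov(np.subtract.outer(O, O))
    G_OB = cov(np.subtract.outer(O, B))
    return np.linalg.solve(G_O, G_OB)

K, phi = 5, 0.6
O = np.arange(-K, 0)      # observed times -K, ..., -1
B = np.array([0])         # target time 0
coef = projection_block(lambda h: ar1_cov(h, phi), O, B)
print(np.round(coef.ravel(), 6))  # weight phi on X_{-1} (last entry), zero elsewhere
```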

This provides an expression of the projection when the covariance $\Gamma$ is known. Actually, in many practical situations, $\Gamma$ is unknown and needs to be estimated from the observations. Recall that we observe $X_{-N},\dots,X_{-1}$. We will estimate the covariance with this sample and use a subset of these observations for the prediction. This last subset will be $\{X_i,\ i\in O_{K(N)}\}$, with $O_{K(N)} := \{-K(N),\dots,-1\}$. Here $(K(N))_{N\in\mathbb{N}}$ is a suitable growing sequence. Hence, the predictor $\hat Y$ will be

$$\hat Y = \hat P_{O_{K(N)}}Y,$$

where $\hat P_{O_{K(N)}}$ denotes some estimator of the projection operator onto $O_{K(N)}$, built with the full sample $(X_i)_{i=-N,\dots,-1}$.

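As usual, we estimate the accuracy of the prediction by the quadratic error

$$\mathrm{MSE}(\hat Y) = \mathbb{E}\big[(\hat Y-Y)^2\big].$$

The bias-variance decomposition gives

$$\mathbb{E}\big[(\hat Y-Y)^2\big] = \mathbb{E}\big[(\hat P_{O_{K(N)}}Y-P_{O_{K(N)}}Y)^2\big]+\mathbb{E}\big[(P_{O_{K(N)}}Y-P_{\mathbb{Z}^-}Y)^2\big]+\mathbb{E}\big[(P_{\mathbb{Z}^-}Y-Y)^2\big],$$

where

$$\hat P_{O_{K(N)}}Y = \hat Y,\qquad P_{O_{K(N)}}Y = \mathbb{E}\big[Y\,\big|\,(X_i)_{i\in O_{K(N)}}\big],$$

and

$$P_{\mathbb{Z}^-}Y = \mathbb{E}\big[Y\,\big|\,(X_i)_{i<0}\big].$$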
This error can be divided into three terms:

• The last term $\mathbb{E}[(P_{\mathbb{Z}^-}Y-Y)^2]$ is the prediction error with infinite past. It is induced by the variance of the unknown future values, and may be easily computed using the covariance operator. This variance does not go to zero as $N$ tends to infinity. It can be seen as an additional term that does not depend on the estimation procedure and thus will be omitted in the error term.

• The second term $\mathbb{E}[(P_{O_{K(N)}}Y-P_{\mathbb{Z}^-}Y)^2]$ is a bias induced by the temporal threshold on the projector.

• The first term $\mathbb{E}[(\hat P_{O_{K(N)}}Y-P_{O_{K(N)}}Y)^2]$ is a variance, due to the fluctuations of the estimation, and decreases to zero as soon as the estimator is consistent. Note that to compute this error, we have to handle the dependency between the prediction operator and the variable $Y$ we aim to predict.

Finally, the natural risk is obtained by removing the prediction error with infinite past:

$$\mathcal{R}\big(\hat Y=\hat P_{O_{K(N)}}Y\big) := \mathbb{E}\big[(\hat P_{O_{K(N)}}Y-P_{O_{K(N)}}Y)^2\big]+\mathbb{E}\big[(P_{O_{K(N)}}Y-P_{\mathbb{Z}^-}Y)^2\big] = \mathbb{E}\Big[\big(\hat Y-\mathbb{E}[Y\,|\,(X_i)_{i<0}]\big)^2\Big].$$

The global risk will be computed by taking the supremum of $\mathcal{R}(\hat Y)$ over all random variables $Y$ in a suitable set (growing with $N$). This set will be defined in the next section.



2 Construction of the empirical projection operator 

Recall that the empirical unbiased covariance estimator is given by (see for example [3])

$$\forall\,0\le p<N,\qquad \hat r^{(N)}(p) = \frac{1}{N-p}\sum_{k=-N}^{-p-1}X_kX_{k+p}.$$

Notice that, when $p$ is close to $N$, the estimation is hampered since we only sum $N-p$ terms. Hence, we will not use the complete available data but rather use a cut-off.
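The estimator $\hat r^{(N)}(p)$ is a plain average of the $N-p$ available lagged products. A minimal Python/NumPy sketch, assuming the path is stored oldest first and that the process is centered (no empirical mean is subtracted, as in the definition above); the name `empirical_autocov` is hypothetical:

```python
import numpy as np

def empirical_autocov(x, p):
    """Unbiased estimator hat r^{(N)}(p) for a zero-mean path.

    x stores X_{-N}, ..., X_{-1} (oldest first); the N - p products
    X_k X_{k+p}, k = -N, ..., -p-1, are averaged.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    if not 0 <= p < N:
        raise ValueError("the lag p must satisfy 0 <= p < N")
    return float(x[: N - p] @ x[p:]) / (N - p)

path = [1.0, 2.0, 3.0, 4.0]          # hypothetical observations X_{-4}, ..., X_{-1}
print(empirical_autocov(path, 0))    # 7.5
print(empirical_autocov(path, 3))    # 4.0: a single product, X_{-4} X_{-1}
```

The second call illustrates the remark above: at lag $p=N-1$ the estimate rests on one product only.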

Recall that $O_{K(N)} := \{-K(N),\dots,-1\}$ denotes the indices of the subset used for the prediction step. We define the empirical spectral density as

$$\hat f^{(N)}_{K(N)}(t) := \sum_{p=-K(N)}^{K(N)}\hat r^{(N)}(|p|)\,e^{ipt}.$$
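Since $\hat r^{(N)}(|p|)$ is symmetric in $p$, the empirical spectral density is the real trigonometric polynomial $\hat r^{(N)}(0)+2\sum_{p=1}^{K(N)}\hat r^{(N)}(p)\cos(pt)$. The sketch below (Python/NumPy, hypothetical helper names and toy data) evaluates it and shows that, unlike $f^*$, it can take negative values, which is what motivates the regularization discussed in this section.

```python
import numpy as np

def empirical_autocov(x, p):
    """hat r^{(N)}(p): average of the N - p products X_k X_{k+p} (zero-mean path, oldest first)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    return float(x[: N - p] @ x[p:]) / (N - p)

def empirical_spectral_density(x, K, t):
    """hat f_K(t) = sum_{|p| <= K} hat r(|p|) e^{ipt} = hat r(0) + 2 sum_{p=1}^{K} hat r(p) cos(pt).
    Requires K < len(x)."""
    r = np.array([empirical_autocov(x, p) for p in range(K + 1)])
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return r[0] + 2 * np.cos(np.outer(t, np.arange(1, K + 1))) @ r[1:]

values = empirical_spectral_density([1.0, 2.0, 3.0, 4.0], K=1, t=[0.0, np.pi])
print(values)  # about [20.83, -5.83]: the estimate can be negative
```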

We now build an estimator for $p_{O_{K(N)}}$ (see Section 1 for the definition of $p_{O_{K(N)}}$). First, we divide the index space $\mathbb{Z}$ into $M_K\cup O_K\cup B_K\cup F_K$, where:

• $M_K = \{\dots,-K-2,-K-1\}$ denotes the indices of the past data that will not be used for the prediction (missing data);

• $O_K = \{-K,\dots,-1\}$ the indices of the data used for the prediction (observed data);

• $B_K = \{0,\dots,K-1\}$ the indices of the data we currently want to forecast (blind data);

• $F_K = \{K,K+1,\dots\}$ the remaining indices (future data).

In the following, we omit the dependency on $N$ to alleviate the notation.

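As discussed in Section 1, the projection operator $p_{O_K}$ may be written by blocks as:

$$p_{O_K} = \begin{pmatrix}\mathrm{Id}_{O_K} & (\Gamma_{O_K})^{-1}\Gamma_{O_KO_K^c}\\ 0 & 0\end{pmatrix}.$$

Since we will apply this operator only to sequences with support in $B_K$, we may consider

$$\forall u\in\ell^2(\mathbb{Z}),\ \mathrm{Supp}(u)\subset B_K,\qquad p_{O_KB_K}u := \begin{pmatrix}(\Gamma_{O_K})^{-1}\Gamma_{O_KB_K}\\ 0\end{pmatrix}u.$$

The last expression is given using the following block decomposition, if $\bar B_K$ denotes the complement of $B_K$ in $\mathbb{Z}$:

$$\Gamma = \begin{pmatrix}\Gamma_{O_KB_K} & \Gamma_{O_K\bar B_K}\\ \Gamma_{O_K^cB_K} & \Gamma_{O_K^c\bar B_K}\end{pmatrix}.$$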

Hence, the two quantities $\Gamma_{O_KB_K}$ and $(\Gamma_{O_K})^{-1}$ have to be estimated. On the one hand, a natural estimator of the first matrix is given by $\hat\Gamma^{(N)}_{O_KB_K}$, defined as

$$\hat\Gamma^{(N)}_{O_KB_K} := \big(\hat r^{(N)}(|j-i|)\big)_{i\in O_K,\,j\in B_K}.$$

On the other hand, a natural way to estimate $(\Gamma_{O_K})^{-1}$ could be to use $\hat\Gamma^{(N)}_{O_K}$ (defined as $(\hat r^{(N)}(|j-i|))_{i,j\in O_K}$) and invert it. However, this matrix is not necessarily invertible. So, we will consider an empirical regularized version by setting

$$\tilde\Gamma^{(N)}_{O_K} := \hat\Gamma^{(N)}_{O_K}+\hat\alpha\,\mathrm{Id}_{O_K},$$

for a well chosen $\hat\alpha$.
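Set

$$\hat\alpha := -\min_t\hat f^{(N)}_{K(N)}(t)\,\mathbb{1}_{\min\hat f^{(N)}_{K(N)}<0}+\frac{m}{4}\,\mathbb{1}_{\min\hat f^{(N)}_{K(N)}<m/4},$$

so that $\tilde\Gamma^{(N)}_{O_K} = \hat\Gamma^{(N)}_{O_K}+\hat\alpha\,\mathrm{Id}_{O_K}$ satisfies

$$\big\|(\tilde\Gamma^{(N)}_{O_K})^{-1}\big\|_{2,op}\le\frac{4}{m}.$$

Remark that $\tilde\Gamma^{(N)}_{O_K}$ is the Toeplitz matrix associated to the function $\hat f^{(N)}_{K(N)}+\hat\alpha$, which has been tailored to be always greater than $m/4$, yielding the desired control of $\|(\tilde\Gamma^{(N)}_{O_K})^{-1}\|_{2,op}$. Other regularization schemes could have been investigated. Nevertheless, note that adding a translation factor makes computation easier than using, for instance, a threshold on $\hat f^{(N)}_{K(N)}$. Indeed, with our perturbation, we only modify the diagonal coefficients of the covariance matrix.

Finally, we will consider the following estimator, for any $Y\in B_K := \mathrm{Span}((X_i)_{i\in B_K})$:

$$\hat P^{(N)}_{O_KB_K}(Y) := \Phi\big(\hat p^{(N)}_{O_KB_K}\,\Phi^{-1}(Y)\big),$$

where the estimator $\hat p^{(N)}_{O_KB_K}$ of $p_{\mathbb{Z}^-B_K}$, with window $K(N)$, is defined as follows:

$$\hat p^{(N)}_{O_KB_K} := \begin{pmatrix}(\tilde\Gamma^{(N)}_{O_K})^{-1}\hat\Gamma^{(N)}_{O_KB_K}\\ 0\end{pmatrix}.\qquad(2)$$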
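Putting the pieces of this section together, the estimator (2) can be sketched as follows in Python/NumPy. This is an illustration under stated assumptions, not a reference implementation: the minimum of $\hat f^{(N)}_{K(N)}$ is approximated on a finite grid (the text uses the exact minimum), the lower bound $m$ of Assumption 1.1 is passed as a known input, and all function names are hypothetical. On a simulated AR(1) path, for which $\mathbb{E}[X_j\,|\,X_{-K},\dots,X_{-1}]=\phi^{j+1}X_{-1}$, column $j$ of the estimate should be close to $\phi^{j+1}$ on its last row and close to $0$ elsewhere.

```python
import numpy as np

def empirical_autocov(x, max_lag):
    """hat r^{(N)}(p) for p = 0, ..., max_lag (zero-mean path stored oldest first)."""
    N = len(x)
    return np.array([x[: N - p] @ x[p:] / (N - p) for p in range(max_lag + 1)])

def empirical_projector(x, K, m, grid_size=4096):
    """Sketch of hat p^{(N)}_{O_K B_K} = (tilde Gamma_{O_K})^{-1} hat Gamma_{O_K B_K}.

    Rows are indexed by O_K = {-K, ..., -1}, columns by B_K = {0, ..., K-1}.
    m is an assumed known lower bound of the spectral density.
    """
    x = np.asarray(x, dtype=float)
    r = empirical_autocov(x, 2 * K - 1)             # lags 0, ..., 2K-1
    a = np.arange(K)
    G_O = r[np.abs(np.subtract.outer(a, a))]        # hat Gamma_{O_K}: lags |i - j| < K
    G_OB = r[np.add.outer(K - a, a)]                # hat Gamma_{O_K B_K}: lag j - i = b + K - a
    # min of hat f_K(t) = r(0) + 2 sum_{p=1}^{K} r(p) cos(pt), approximated on a grid.
    t = np.linspace(0.0, 2 * np.pi, grid_size, endpoint=False)
    f_hat = r[0] + 2 * np.cos(np.outer(t, np.arange(1, K + 1))) @ r[1 : K + 1]
    fmin = f_hat.min()
    alpha = -fmin * (fmin < 0) + (m / 4) * (fmin < m / 4)   # translation hat alpha
    G_tilde = G_O + alpha * np.eye(K)
    return np.linalg.solve(G_tilde, G_OB)

# Hypothetical test case: AR(1) path with phi = 0.6 and unit innovations.
rng = np.random.default_rng(0)
phi, N, K, burn = 0.6, 20_000, 5, 500
e = rng.standard_normal(N + burn)
x = np.empty_like(e)
x[0] = e[0]
for k in range(1, len(e)):
    x[k] = phi * x[k - 1] + e[k]
x = x[burn:]

P = empirical_projector(x, K, m=1 / (1 + phi) ** 2)   # f* >= 1/(1+phi)^2 for this AR(1)
print(np.round(P[:, 0], 2))      # close to [0, 0, 0, 0, 0.6]
x0_hat = P[:, 0] @ x[-K:]        # predicted X_0 from X_{-K}, ..., X_{-1}
```

Solving the linear system rather than forming $(\tilde\Gamma^{(N)}_{O_K})^{-1}$ explicitly is the usual numerically safer choice; it does not change the estimator.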
3 Asymptotic behavior of the empirical projection operator

In this section, we give the rate of convergence of the estimator built previously (see Section 2). We will bound uniformly the prediction error for random variables in the close future.

First, let us give some conditions on the sequence $(K(N))_{N\in\mathbb{N}}$:

Assumption 3.1. The sequence $(K(N))_{N\in\mathbb{N}}$ satisfies

• $\lim_{N\to\infty}K(N) = +\infty$,

• $\lim_{N\to\infty}\dfrac{K(N)\log(K(N))}{N} = 0$.

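Recall that the pointwise risk in $Y\in L^2(\mathbb{P})$ is defined by

$$\mathcal{R}(\hat Y) = \mathbb{E}\Big[\big(\hat Y-\mathbb{E}[Y\,|\,(X_i)_{i<0}]\big)^2\Big].$$

The global risk for the window $K(N)$ is defined by taking the supremum of the pointwise risk over all random variables $Y\in B_K = \mathrm{Span}((X_i)_{i\in B_K})$:

$$\mathcal{R}_{K(N)}\big(\hat P^{(N)}_{O_KB_K}\big) = \sup_{Y\in B_K,\ \mathrm{Var}(Y)\le 1}\mathcal{R}\big(\hat P^{(N)}_{O_KB_K}(Y)\big).$$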



Notice that we could have chosen to evaluate the prediction quality only on $X_0$. Nevertheless, the rate of convergence is not modified if we evaluate the prediction quality for all random variables from the close future. Indeed, the major part of the observations will be used for the estimation, and the conditional expectation is taken only on the $K(N)$ most recent observations. Our result will then be quite stronger than if we had dealt only with the prediction of $X_0$.

To get a control on the bias of the prediction, we need some regularity assumption. We consider Sobolev-type regularity by setting

$$\forall s>1,\qquad W_s := \Big\{g\in L^2([0,2\pi)),\ g(t)=\sum_{k\in\mathbb{Z}}a_ke^{ikt},\ \sum_{k\in\mathbb{Z}}k^{2s}a_k^2<\infty\Big\},$$

and define

$$\forall g\in W_s,\ g(t)=\sum_{k\in\mathbb{Z}}a_ke^{ikt},\qquad \|g\|_{W_s} := \inf\Big\{M,\ \sum_{k\in\mathbb{Z}}k^{2s}a_k^2<M\Big\}.$$

Assumption 3.2. There exists $s>1$ such that $f^*\in W_s$.

We can now state our results. The following lemmas may be used in other frameworks than the blind problem. More precisely, while the blind prediction problem is very specific, the control of the loss between prediction with finite and infinite past is more classical, and the following lemmas may be applied to that kind of question. The case where independent samples are available may also be tackled with these estimators, using the rates of convergence given in operator norms.

The bias is given by the following lemma.

Lemma 3.3. For $N$ large enough, the following upper bound holds:

$$\|p_{O_KB_K}-p_{\mathbb{Z}^-B_K}\|_\Gamma\le\frac{C_2}{K(N)^{\frac{2s-1}{2}}},$$

where $C_2$ is a constant depending only on $m$, $m'$, $s$ and $\|1/f^*\|_{W_s}$, made explicit in the proof (Section 5.3).

In the last lemma, we assume regularity in terms of Sobolev classes. Nevertheless, the proof may be written with some other kind of regularity. The proof is given in the Appendix, and is essentially based on Proposition 4.1. This last proposition provides the Schur block inversion of the projection operator.

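The control for the variance is given in the following lemma.

Lemma 3.4.

$$\int_{\mathbb{R}^+}\mathbb{P}\Big(\big\|\hat p^{(N)}_{O_KB_K}-p_{O_KB_K}\big\|_{2,op}^4>t\Big)\,dt\le C^4K(N)^4\Big(\frac{\log(K(N))}{N}\Big)^2+o\Big(K(N)^4\Big(\frac{\log(K(N))}{N}\Big)^2\Big),$$

where $C = 4m'\Big(\dfrac{16m'}{m^2}+\dfrac{4}{m}+2\Big)$.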



Again, we choose this concentration formulation to deal with the dependency of the 
blind prediction problem, but this result gives immediately a control of the variance of 
the estimator whenever independent samples are observed (one for the estimation, and 
another one for the prediction). 

The proof of this lemma is given in Section 5.3. It is based on a concentration inequality for the estimators $\hat r^{(N)}(p)$ (see Comte [10]).

Integrating this rate of convergence over the blind data, we get our main theorem. 

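Theorem 3.5. Under Assumptions 1.1 and 3.2, for $N$ large enough, the empirical estimator satisfies

$$\sqrt{\mathcal{R}_{K(N)}\big(\hat P^{(N)}_{O_KB_K}\big)}\le\sqrt{C_1\frac{K(N)^4\log(K(N))}{N}}\,\big(1+o(1)\big)+\frac{C_2}{K(N)^{\frac{2s-1}{2}}},$$

where $C_1$ and $C_2$ are given in the Appendix.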

Again, the proof of this result is given in Section 5.2. It is quite technical. The main difficulty is induced by the blindness. Indeed, in this step, we have to deal with the dependency between the data and the empirical projector.

Obviously, the best rate of convergence is obtained by balancing the variance and the bias and finding the best window $K(N)$. Indeed, the variance increases with $K(N)$ while the bias decreases. Define $P^*_N$ as the projector $\hat P^{(N)}_{O_KB_K}$ associated to the sequence $K^*(N)$ that minimizes the bound in the last theorem. We get:



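Corollary 3.6 (Rate of convergence of the prediction estimator). Under Assumptions 1.1 and 3.2, for $N$ large enough and choosing $K(N)\asymp\big(N/\log(N)\big)^{\frac{1}{2s+3}}$, we get

$$\mathcal{R}_{K(N)}\big(P^*_N\big) = O\Big(\Big(\frac{\log(N)}{N}\Big)^{\frac{2s-1}{2s+3}}\Big).\qquad(3)$$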
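The choice of $K^*(N)$ can be explored numerically. Assuming, as in Theorem 3.5, a square-root-scale variance term of order $K(N)^2\sqrt{\log(K(N))/N}$ and a bias term of order $K(N)^{-(2s-1)/2}$, the sketch below (Python/NumPy, hypothetical constants $C_1=C_2=1$, illustrative grid search) finds the minimizing integer window and prints it next to $(N/\log N)^{1/(2s+3)}$; only the growth trend, not the constants, is meaningful.

```python
import numpy as np

def risk_bound(K, N, s, C1=1.0, C2=1.0):
    """sqrt(C1 K^4 log K / N) + C2 K^{-(2s-1)/2}, with hypothetical constants C1, C2
    (the (1 + o(1)) factor is dropped)."""
    K = np.asarray(K, dtype=float)
    return np.sqrt(C1 * K**4 * np.log(K) / N) + C2 * K ** (-(2 * s - 1) / 2)

def best_window(N, s, K_max=10_000, **constants):
    """Integer window K in [2, K_max) minimising risk_bound (exhaustive grid search)."""
    Ks = np.arange(2, K_max)
    return int(Ks[np.argmin(risk_bound(Ks, N, s, **constants))])

s = 2.0
for N in (10**4, 10**6, 10**8):
    print(N, best_window(N, s), (N / np.log(N)) ** (1 / (2 * s + 3)))
```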
Notice that, in real-life issues, it would be more natural to balance the risk given in Theorem 3.5 with the macroscopic variance term given by

$$\mathbb{E}\Big[\big(Y-\mathbb{E}[Y\,|\,(X_i)_{i<0}]\big)^2\Big].$$

This leads to a much greater $K(N)$. Nevertheless, Corollary 3.6 has a theoretical interest. Indeed, it recovers the classical semi-parametric rate of convergence, and provides a way to get away from dependency. Notice that the estimation rate increases with the regularity $s$ of the spectral density $f^*$. More precisely, if $s\to\infty$, we obtain the rate $\log(N)/N$. This is, up to the log-term, the optimal speed. As a matter of fact, in this case, estimating the first coefficients of the covariance matrix is enough. Hence, the bias is very small. Proving a lower bound on the mean error (that could lead to a minimax result) is a difficult task, since the tools used to design the estimator are far from the usual estimation methods.



4 Projection onto finite observations with known covariance

We aim at providing an exact expression for the projection operator. For this, we generalize the expression given by Bondon ([5], [7]) for a projector onto the infinite past. Recall that, for any $A\subset\mathbb{Z}$, if $A^c$ denotes the complement of $A$ in $\mathbb{Z}$, the projector $p_A$ may be written blockwise (see for instance [12]) as:

$$p_A = \begin{pmatrix}\mathrm{Id}_A & \Gamma_A^{-1}\Gamma_{AA^c}\\ 0 & 0\end{pmatrix}.$$

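Denote also $\Lambda := \Gamma^{-1} = T(1/f^*)$ the inverse of the covariance operator. The following proposition provides an alternative expression of any projection operator.

Proposition 4.1. One has

$$p_A = \begin{pmatrix}\mathrm{Id}_A & -\Lambda_{AA^c}\Lambda_{A^c}^{-1}\\ 0 & 0\end{pmatrix}.$$

Furthermore, the prediction error verifies

$$\mathbb{E}\big[(P_AY-Y)^2\big] = u_{A^c}^T\,\Lambda_{A^c}^{-1}\,u_{A^c},$$

where $Y=\Phi(u)=u^TX$ and $u_{A^c} := (u_i)_{i\in A^c}$.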



The proof of this proposition is given in the Appendix. We point out that this proposition is helpful for the computation of the bias. Indeed, it gives a way to calculate the norm of the difference between two inverse operators.
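In finite dimension, both statements of Proposition 4.1 are block-matrix identities and can be checked numerically. The sketch below (Python/NumPy) uses a random symmetric positive definite matrix as a hypothetical stand-in for $\Gamma$ and an arbitrary index set $A$; it is a sanity check of the algebra, not of the infinite-dimensional setting.

```python
import numpy as np

def block(X, I, J):
    """Minor (I, J) of the matrix X."""
    return X[np.ix_(I, J)]

rng = np.random.default_rng(2)
n = 8
R = rng.standard_normal((n, n))
Gamma = R @ R.T + n * np.eye(n)          # hypothetical SPD stand-in for the covariance
Lam = np.linalg.inv(Gamma)               # Lambda = Gamma^{-1}
A = np.array([0, 2, 3, 6])               # hypothetical observed index set
M = np.setdiff1d(np.arange(n), A)        # its complement A^c

# Upper-right block of p_A, computed in two ways.
via_gamma = np.linalg.solve(block(Gamma, A, A), block(Gamma, A, M))   # Gamma_A^{-1} Gamma_{AM}
via_lambda = -block(Lam, A, M) @ np.linalg.inv(block(Lam, M, M))      # -Lambda_{AM} Lambda_M^{-1}
print(np.allclose(via_gamma, via_lambda))  # True

# Prediction error for u supported on A^c: ||u - p_A u||_Gamma^2 = u_M^T Lambda_M^{-1} u_M.
u_M = rng.standard_normal(len(M))
w = np.zeros(n)
w[M] = u_M
w[A] = -via_gamma @ u_M                  # coordinates of u - p_A u
err_direct = w @ Gamma @ w
err_formula = u_M @ np.linalg.inv(block(Lam, M, M)) @ u_M
print(np.isclose(err_direct, err_formula))  # True
```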



5 Appendix 

5.1 Proof of Proposition 4.1

Proof. For the proof of Proposition 4.1, let us choose $A\subset\mathbb{Z}$ and denote the complement of $A$ in $\mathbb{Z}$ by $M := A^c$.

First of all, $\Lambda=\Gamma^{-1}$ is a Toeplitz operator over $H$ with eigenvalues in $[1/m',1/m]$. $\Lambda_M$ may be inverted as a principal minor of $\Lambda$. Let us define the Schur complement of $\Lambda$ on sequences with support in $M$: $S = \Lambda_A-\Lambda_{AM}\Lambda_M^{-1}\Lambda_{MA}$. The next lemma provides an expression of $\Gamma_A^{-1}$ (see for instance [14]).

Lemma 5.1.

$$\Gamma_A^{-1} = S = \Lambda_A-\Lambda_{AM}\Lambda_M^{-1}\Lambda_{MA}.$$



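Proof of Lemma 5.1. One can check that

$$\begin{pmatrix}\Lambda_A & \Lambda_{AM}\\ \Lambda_{MA} & \Lambda_M\end{pmatrix}\begin{pmatrix}S^{-1} & -S^{-1}\Lambda_{AM}\Lambda_M^{-1}\\ -\Lambda_M^{-1}\Lambda_{MA}S^{-1} & \Lambda_M^{-1}+\Lambda_M^{-1}\Lambda_{MA}S^{-1}\Lambda_{AM}\Lambda_M^{-1}\end{pmatrix} = \begin{pmatrix}\mathrm{Id}_A & 0\\ 0 & \mathrm{Id}_M\end{pmatrix}.$$

Indeed, the $(A,A)$ block equals $(\Lambda_A-\Lambda_{AM}\Lambda_M^{-1}\Lambda_{MA})S^{-1} = SS^{-1} = \mathrm{Id}_A$, the $(M,A)$ block equals $(\Lambda_{MA}-\Lambda_M\Lambda_M^{-1}\Lambda_{MA})S^{-1} = 0$, the $(A,M)$ block equals $-SS^{-1}\Lambda_{AM}\Lambda_M^{-1}+\Lambda_{AM}\Lambda_M^{-1} = 0$, and the $(M,M)$ block equals $\mathrm{Id}_M$. Since the matrices are symmetric, we can transpose the last equality. We obtain that

$$\Gamma = \Lambda^{-1} = \begin{pmatrix}S^{-1} & -S^{-1}\Lambda_{AM}\Lambda_M^{-1}\\ -\Lambda_M^{-1}\Lambda_{MA}S^{-1} & \Lambda_M^{-1}+\Lambda_M^{-1}\Lambda_{MA}S^{-1}\Lambda_{AM}\Lambda_M^{-1}\end{pmatrix},$$

so that $\Gamma_A = S^{-1}$. □

We now compute the projection operator:

$$\begin{aligned}
p_A &= \begin{pmatrix}\mathrm{Id}_A & \Gamma_A^{-1}\Gamma_{AM}\\ 0 & 0\end{pmatrix} = \begin{pmatrix}\mathrm{Id}_A & S\,\Gamma_{AM}\\ 0 & 0\end{pmatrix} = \begin{pmatrix}\mathrm{Id}_A & (\Lambda_A-\Lambda_{AM}\Lambda_M^{-1}\Lambda_{MA})\Gamma_{AM}\\ 0 & 0\end{pmatrix}\\
&= \begin{pmatrix}\mathrm{Id}_A & \Lambda_A\Gamma_{AM}-\Lambda_{AM}\Lambda_M^{-1}(\mathrm{Id}_M-\Lambda_M\Gamma_M)\\ 0 & 0\end{pmatrix}\\
&= \begin{pmatrix}\mathrm{Id}_A & \Lambda_A\Gamma_{AM}-\Lambda_{AM}\Lambda_M^{-1}+\Lambda_{AM}\Gamma_M\\ 0 & 0\end{pmatrix} = \begin{pmatrix}\mathrm{Id}_A & -\Lambda_{AM}\Lambda_M^{-1}\\ 0 & 0\end{pmatrix},
\end{aligned}$$

where we have used $\Lambda\Gamma = \mathrm{Id}$ in the last two lines (namely $\Lambda_{MA}\Gamma_{AM}+\Lambda_M\Gamma_M = \mathrm{Id}_M$ and $\Lambda_A\Gamma_{AM}+\Lambda_{AM}\Gamma_M = 0$).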
Now consider $Q$, the quadratic error operator. It is defined as

$$\forall u\in\ell^2(\mathbb{Z}),\qquad u^TQu := \|p_Au-u\|_\Gamma^2 = \mathbb{E}\big[(\Phi(u)-P_A\Phi(u))^2\big].$$

This operator $Q$ can be obtained by a direct computation (writing the product right above), but it is easier to use the expression of the variance of a projector in the Gaussian case given for instance by [12]:

$$Q = \Gamma_M-\Gamma_{MA}\Gamma_A^{-1}\Gamma_{AM}.$$

Again, notice that $Q$ is the Schur complement of $\Gamma$ on sequences with support in $A$, and thanks to Lemma 5.1 applied to $\Lambda$ instead of $\Gamma$, we get

$$Q = \Lambda_M^{-1}.$$

This ends the proof of Proposition 4.1. □
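The Schur-complement identity of Lemma 5.1, and the explicit block inverse used in its proof, can likewise be checked on a finite matrix. Again, the random positive definite matrix and the index set below are hypothetical stand-ins chosen for illustration (Python/NumPy).

```python
import numpy as np

def block(X, I, J):
    """Minor (I, J) of the matrix X."""
    return X[np.ix_(I, J)]

rng = np.random.default_rng(5)
n = 7
R = rng.standard_normal((n, n))
Gamma = R @ R.T + n * np.eye(n)          # hypothetical SPD stand-in for the covariance
Lam = np.linalg.inv(Gamma)
A = np.array([1, 4, 5])
M = np.setdiff1d(np.arange(n), A)

LM_inv = np.linalg.inv(block(Lam, M, M))
S = block(Lam, A, A) - block(Lam, A, M) @ LM_inv @ block(Lam, M, A)   # Schur complement
print(np.allclose(np.linalg.inv(block(Gamma, A, A)), S))  # Lemma 5.1: Gamma_A^{-1} = S

# Explicit block inverse from the proof, with indices reordered as (A, M).
S_inv = np.linalg.inv(S)
inv_blocks = np.block([
    [S_inv, -S_inv @ block(Lam, A, M) @ LM_inv],
    [-LM_inv @ block(Lam, M, A) @ S_inv,
     LM_inv + LM_inv @ block(Lam, M, A) @ S_inv @ block(Lam, A, M) @ LM_inv],
])
perm = np.concatenate([A, M])
print(np.allclose(Lam[np.ix_(perm, perm)] @ inv_blocks, np.eye(n)))  # True
```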



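5.2 Proof of Theorem 3.5

Proof of Theorem 3.5. Recall that we aim at providing a bound on $\sqrt{\mathcal{R}_{K(N)}(\hat P^{(N)}_{O_KB_K})}$. Notice first that, by the Minkowski inequality, we have

$$\sqrt{\mathcal{R}_{K(N)}\big(\hat P^{(N)}_{O_KB_K}\big)}\le\sqrt{\sup_{Y\in B_K,\ \mathrm{Var}(Y)\le 1}\mathbb{E}\big[(\hat P^{(N)}_{O_KB_K}(Y)-P_{O_KB_K}(Y))^2\big]}+\sqrt{\mathcal{R}_{K(N)}\big(P_{O_KB_K}\big)}.$$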

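Using Lemma 3.3, for a sequence $(K(N))_{N\in\mathbb{N}}$ and a centered random variable $Y\in\mathrm{Span}((X_i)_{i\in B_K})$ such that $\mathbb{E}[Y^2]=1$, we have

$$\sqrt{\mathcal{R}\big(P_{O_KB_K}(Y)\big)} = \sqrt{\mathbb{E}\big[(P_{O_KB_K}Y-P_{\mathbb{Z}^-}Y)^2\big]}\le\|p_{O_KB_K}-p_{\mathbb{Z}^-B_K}\|_\Gamma\le\frac{C_2}{K(N)^{\frac{2s-1}{2}}}.$$

For the variance, we first notice that $Y=\Phi(u)=u^TX=\sum_{j=0}^{K(N)-1}u_jX_j$, with

$$1 = \mathbb{E}[Y^2] = u^T\Gamma_{B_K}u\ge m\,u^Tu = m\|u\|_2^2.$$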

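Denote $\Delta := \hat p^{(N)}_{O_KB_K}-p_{O_KB_K}$. By applying the Cauchy-Schwarz inequality twice, we can write

$$\begin{aligned}
\mathbb{E}\big[(\hat P^{(N)}_{O_KB_K}Y-P_{O_KB_K}Y)^2\big] &= \int_\Omega\Big(\sum_{i=-K(N)}^{-1}\sum_{j=0}^{K(N)-1}\Delta_{ij}(\omega)\,u_j\,X_i(\omega)\Big)^2d\mathbb{P}(\omega)\\
&\le\int_\Omega\sum_{i=-K(N)}^{-1}\Big(\sum_{j=0}^{K(N)-1}\Delta_{ij}(\omega)u_j\Big)^2\sum_{i=-K(N)}^{-1}X_i(\omega)^2\,d\mathbb{P}(\omega)\\
&\le\int_\Omega\sum_{i=-K(N)}^{-1}\sum_{j=0}^{K(N)-1}\Delta_{ij}(\omega)^2\sum_{j=0}^{K(N)-1}u_j^2\sum_{i=-K(N)}^{-1}X_i(\omega)^2\,d\mathbb{P}(\omega).
\end{aligned}$$

So that, since $\|u\|_2^2\le 1/m$,

$$\mathbb{E}\big[(\hat P^{(N)}_{O_KB_K}Y-P_{O_KB_K}Y)^2\big]\le\frac{1}{m}\int_\Omega\sum_{i=-K(N)}^{-1}\sum_{j=0}^{K(N)-1}\Delta_{ij}(\omega)^2\sum_{i=-K(N)}^{-1}X_i(\omega)^2\,d\mathbb{P}(\omega).$$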

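Using the following equivalence between two norms for finite matrices of size $(n_1,n_2)$ (see for instance [11]):

$$\|D\|_{2,op}^2\le\sum_{i,j}D_{ij}^2\le\min(n_1,n_2)\,\|D\|_{2,op}^2,$$

we obtain

$$\mathbb{E}\big[(\hat P^{(N)}_{O_KB_K}Y-P_{O_KB_K}Y)^2\big]\le\frac{K(N)}{m}\int_\Omega\|\Delta(\omega)\|_{2,op}^2\sum_{i=-K(N)}^{-1}X_i(\omega)^2\,d\mathbb{P}(\omega).$$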
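The norm equivalence used in this step, $\|D\|_{2,op}^2\le\sum_{i,j}D_{ij}^2\le\min(n_1,n_2)\|D\|_{2,op}^2$, follows from the fact that the sum of squared entries equals the sum of the squared singular values, of which there are at most $\min(n_1,n_2)$. A quick numerical check (Python/NumPy, random hypothetical matrix):

```python
import numpy as np

D = np.random.default_rng(3).standard_normal((6, 4))   # hypothetical (n1, n2) = (6, 4) matrix
frob2 = np.sum(D**2)                                    # sum_{i,j} D_ij^2
op2 = np.linalg.norm(D, 2) ** 2                         # squared largest singular value
print(op2 <= frob2 <= min(D.shape) * op2)  # True
```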



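Further,

$$\begin{aligned}
\mathbb{E}\big[(\hat P^{(N)}_{O_KB_K}Y-P_{O_KB_K}Y)^2\big] &\le\frac{K(N)}{m}\Big(\int_\Omega\|\Delta(\omega)\|_{2,op}^4\,d\mathbb{P}(\omega)\Big)^{1/2}\Big(\int_\Omega\Big(\sum_{i=-K(N)}^{-1}X_i(\omega)^2\Big)^2d\mathbb{P}(\omega)\Big)^{1/2}\\
&\le\frac{K(N)^2\sqrt{\mu_4}}{m}\Big(\int_{\mathbb{R}^+}\mathbb{P}\big(\|\Delta\|_{2,op}^4>t\big)\,dt\Big)^{1/2},
\end{aligned}$$

where $\mu_4 := \mathbb{E}[X_0^4]$. We have used here again the Cauchy-Schwarz inequality (which gives $(\sum_iX_i^2)^2\le K(N)\sum_iX_i^4$), the stationarity of $X$, and the fact that, for any nonnegative random variable $Y$,

$$\mathbb{E}[Y] = \int_{\mathbb{R}^+}\mathbb{P}(Y>t)\,dt.$$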
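The identity $\mathbb{E}[Y]=\int_{\mathbb{R}^+}\mathbb{P}(Y>t)\,dt$ for a nonnegative $Y$ is what turns the tail bounds of Lemma 3.4 into moment bounds. For an empirical distribution the integral is a finite sum over the sorted sample, which the following sketch (Python/NumPy, hypothetical sample, illustrative function name) compares with the sample mean:

```python
import numpy as np

def tail_integral(y):
    """int_0^inf P_n(Y > t) dt for the empirical law P_n of a nonnegative sample y."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    edges = np.concatenate([[0.0], y])
    survival = (n - np.arange(n)) / n   # value of P_n(Y > t) on [edges[i], edges[i+1])
    return float(np.sum(survival * np.diff(edges)))

sample = np.random.default_rng(6).exponential(size=1_000) ** 2   # hypothetical nonnegative sample
print(np.isclose(tail_integral(sample), sample.mean()))  # True
```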

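Since $X$ is Gaussian, its fourth-order moment $\mu_4 = \mathbb{E}[X_0^4]$ is finite. Then Lemma 3.4 yields that, for $N$ large enough,

$$\mathbb{E}\big[(\hat P^{(N)}_{O_KB_K}Y-P_{O_KB_K}Y)^2\big]\le\frac{C^2\sqrt{\mu_4}}{m}\,\frac{K(N)^4\log(K(N))}{N}\,\big(1+o(1)\big).$$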



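So that,

$$\sup_{Y\in B_K,\ \mathrm{Var}(Y)\le 1}\mathbb{E}\big[(\hat P^{(N)}_{O_KB_K}(Y)-P_{O_KB_K}(Y))^2\big]\le C_1\frac{K(N)^4\log(K(N))}{N}\,\big(1+o(1)\big),$$

with $C_1 = C^2\sqrt{\mu_4}/m$. This ends the proof of the theorem.

□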
5.3 Proofs of concentration and regularity lemmas

First, we compute the bias and prove Lemma 3.3.

Proof of Lemma 3.3.

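Recall that we aim to obtain a bound on $\|p_{O_KB_K}-p_{\mathbb{Z}^-B_K}\|_\Gamma$, where $\mathbb{Z}^- := M_K\cup O_K$ and $\mathbb{Z}^+ := B_K\cup F_K$. Using Proposition 4.1 (with $A=\mathbb{Z}^-$ for the second projector), we can write, blockwise with rows indexed by $O_K$ and $M_K$,

$$\|p_{O_KB_K}-p_{\mathbb{Z}^-B_K}\|_\Gamma\le\|p_{O_K\mathbb{Z}^+}-p_{\mathbb{Z}^-\mathbb{Z}^+}\|_\Gamma = \left\|\begin{pmatrix}(\Gamma_{O_K})^{-1}\Gamma_{O_K\mathbb{Z}^+}\\ 0\end{pmatrix}-\begin{pmatrix}-\Lambda_{O_K\mathbb{Z}^+}(\Lambda_{\mathbb{Z}^+})^{-1}\\ -\Lambda_{M_K\mathbb{Z}^+}(\Lambda_{\mathbb{Z}^+})^{-1}\end{pmatrix}\right\|_\Gamma.$$

So that, using the norm equivalence,

$$\begin{aligned}
\|p_{O_KB_K}-p_{\mathbb{Z}^-B_K}\|_\Gamma &\le\frac{m'}{m}\left\|\begin{pmatrix}(\Gamma_{O_K})^{-1}\Gamma_{O_K\mathbb{Z}^+}+\Lambda_{O_K\mathbb{Z}^+}(\Lambda_{\mathbb{Z}^+})^{-1}\\ \Lambda_{M_K\mathbb{Z}^+}(\Lambda_{\mathbb{Z}^+})^{-1}\end{pmatrix}\right\|_{2,op}\\
&\le\frac{m'}{m}\left\|\begin{pmatrix}(\Gamma_{O_K})^{-1}\Gamma_{O_K\mathbb{Z}^+}\Lambda_{\mathbb{Z}^+}+\Lambda_{O_K\mathbb{Z}^+}\\ \Lambda_{M_K\mathbb{Z}^+}\end{pmatrix}\right\|_{2,op}\big\|(\Lambda_{\mathbb{Z}^+})^{-1}\big\|_{2,op}\\
&\le\frac{m'}{m}\big\|(\Lambda_{\mathbb{Z}^+})^{-1}\big\|_{2,op}\Big(\big\|(\Gamma_{O_K})^{-1}\Gamma_{O_K\mathbb{Z}^+}\Lambda_{\mathbb{Z}^+}+\Lambda_{O_K\mathbb{Z}^+}\big\|_{2,op}+\|\Lambda_{M_K\mathbb{Z}^+}\|_{2,op}\Big).
\end{aligned}$$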

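The last step follows from the inequality:

$$\left\|\begin{pmatrix}A\\ B\end{pmatrix}\right\|_{2,op}\le\|A\|_{2,op}+\|B\|_{2,op}.$$

But, since $\Lambda=\Gamma^{-1}$, the block $(O_K,\mathbb{Z}^+)$ of $\Gamma\Lambda=\mathrm{Id}$ gives

$$\Gamma_{O_K}\Lambda_{O_K\mathbb{Z}^+}+\Gamma_{O_KM_K}\Lambda_{M_K\mathbb{Z}^+}+\Gamma_{O_K\mathbb{Z}^+}\Lambda_{\mathbb{Z}^+} = 0,$$

that is, $(\Gamma_{O_K})^{-1}\Gamma_{O_K\mathbb{Z}^+}\Lambda_{\mathbb{Z}^+}+\Lambda_{O_K\mathbb{Z}^+} = -(\Gamma_{O_K})^{-1}\Gamma_{O_KM_K}\Lambda_{M_K\mathbb{Z}^+}$. So, we obtain

$$\|p_{O_KB_K}-p_{\mathbb{Z}^-B_K}\|_\Gamma\le\frac{m'}{m}\big\|(\Lambda_{\mathbb{Z}^+})^{-1}\big\|_{2,op}\Big(\big\|(\Gamma_{O_K})^{-1}\big\|_{2,op}\|\Gamma_{O_KM_K}\|_{2,op}+1\Big)\|\Lambda_{M_K\mathbb{Z}^+}\|_{2,op}.$$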

But, we have

$$\big\|(\Lambda_{\mathbb{Z}^+})^{-1}\big\|_{2,op}\le m',$$

as the inverse of a principal minor of $\Lambda$,

$$\big\|(\Gamma_{O_K})^{-1}\big\|_{2,op}\le\frac{1}{m},$$

since it is the inverse of a principal minor of $\Gamma$, and

$$\|\Gamma_{O_KM_K}\|_{2,op}\le m',$$

as an extracted operator of $\Gamma$.
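Thus, we get

$$\|p_{O_KB_K}-p_{\mathbb{Z}^-B_K}\|_\Gamma\le C_4\,\|\Lambda_{M_K\mathbb{Z}^+}\|_{2,op},$$

where $C_4 = \dfrac{m'^2}{m}\Big(1+\dfrac{m'}{m}\Big)$. Since $f^*\in W_s$ (Assumption 3.2) and $f^*\ge m>0$, we also have $1/f^*\in W_s$. If we denote by $\lambda(k) = \Lambda_{i,i+k}$ the Fourier coefficients of $1/f^*$, we get, bounding the operator norm by the Frobenius norm,

$$\begin{aligned}
\|\Lambda_{M_K\mathbb{Z}^+}\|_{2,op}^2 &\le\sum_{i<-K(N),\ 0\le j}\lambda(j-i)^2\le\sum_{i=K(N)}^{\infty}\sum_{j=i}^{\infty}\lambda(j)^2\\
&\le\sum_{i=K(N)}^{\infty}\frac{\|1/f^*\|_{W_s}}{i^{2s}}\le\frac{\|1/f^*\|_{W_s}}{(2s-1)\,(K(N)-1)^{2s-1}}.
\end{aligned}$$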



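So that the lemma is proved and the bias is given by

$$\|p_{O_KB_K}-p_{\mathbb{Z}^-B_K}\|_\Gamma\le C_4\sqrt{\frac{\|1/f^*\|_{W_s}}{2s-1}}\,\frac{1}{(K(N)-1)^{\frac{2s-1}{2}}}.$$

□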

Actually, the rate of convergence for the bias is given by the regularity of the spectral 
density, since it depends on the coefficients far away from the principal diagonal. 



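Now, we prove Lemma 3.4, which completes the proof of the theorem.

Proof of Lemma 3.4. Recall that $\Delta = \hat p^{(N)}_{O_KB_K}-p_{O_KB_K}$. We aim at proving that

$$\int_{\mathbb{R}^+}\mathbb{P}\big(\|\Delta\|_{2,op}^4>t\big)\,dt\le C^4K(N)^4\Big(\frac{\log(K(N))}{N}\Big)^2+o\Big(K(N)^4\Big(\frac{\log(K(N))}{N}\Big)^2\Big).$$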



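First,

$$\begin{aligned}
\|\Delta\|_{2,op} &= \big\|(\tilde\Gamma^{(N)}_{O_K})^{-1}\hat\Gamma^{(N)}_{O_KB_K}-(\Gamma_{O_K})^{-1}\Gamma_{O_KB_K}\big\|_{2,op}\\
&\le\|\Gamma_{O_KB_K}\|_{2,op}\big\|(\tilde\Gamma^{(N)}_{O_K})^{-1}-(\Gamma_{O_K})^{-1}\big\|_{2,op}+\big\|(\tilde\Gamma^{(N)}_{O_K})^{-1}\big\|_{2,op}\big\|\hat\Gamma^{(N)}_{O_KB_K}-\Gamma_{O_KB_K}\big\|_{2,op}\\
&\le\|\Gamma_{O_KB_K}\|_{2,op}\big\|(\tilde\Gamma^{(N)}_{O_K})^{-1}\big\|_{2,op}\big\|(\Gamma_{O_K})^{-1}\big\|_{2,op}\big\|\tilde\Gamma^{(N)}_{O_K}-\Gamma_{O_K}\big\|_{2,op}+\big\|(\tilde\Gamma^{(N)}_{O_K})^{-1}\big\|_{2,op}\big\|\hat\Gamma^{(N)}_{O_KB_K}-\Gamma_{O_KB_K}\big\|_{2,op}.
\end{aligned}$$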
But, we have $\|\Gamma_{O_KB_K}\|_{2,op}\le m'$, as an extracted operator of $\Gamma$; $\|(\Gamma_{O_K})^{-1}\|_{2,op}\le 1/m$, as the inverse of a principal minor of $\Gamma$; and

$$\big\|(\tilde\Gamma^{(N)}_{O_K})^{-1}\big\|_{2,op}\le\frac{4}{m},$$

thanks to the regularization. Furthermore,

$$\big\|\tilde\Gamma^{(N)}_{O_K}-\Gamma_{O_K}\big\|_{2,op}\le K(N)\sup_{p<2K(N)}\big|\hat r^{(N)}(p)-r(p)\big|+|\hat\alpha|.$$



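Similarly,

$$\big\|\hat\Gamma^{(N)}_{O_KB_K}-\Gamma_{O_KB_K}\big\|_{2,op}\le K(N)\sup_{p<2K(N)}\big|\hat r^{(N)}(p)-r(p)\big|.$$

Moreover, by definition of $\hat\alpha$,

$$|\hat\alpha|\le\Big|\min_t\hat f^{(N)}_{K(N)}(t)\Big|\,\mathbb{1}_{\min\hat f^{(N)}_{K(N)}<0}+\frac{m}{4}\,\mathbb{1}_{\min\hat f^{(N)}_{K(N)}<m/4}.$$

So,

$$|\hat\alpha|\le(2K(N)+1)\sup_{p<2K(N)}\big|\hat r^{(N)}(p)-r(p)\big|+\frac{m}{4}\,\mathbb{1}_{\min\hat f^{(N)}_{K(N)}<m/4}.$$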

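For the last inequality, we used the following lemma, proved in the next section.

Lemma 5.2. The empirical spectral density is such that, for $N$ large enough,

$$\big\|\hat f^{(N)}_{K(N)}-f^*\big\|_\infty\le(2K(N)+1)\sup_{p<2K(N)}\big|\hat r^{(N)}(p)-r(p)\big|+\frac{m}{4}.$$

This implies, since $f^*\ge m$,

$$\Big|\min_t\hat f^{(N)}_{K(N)}(t)\Big|\,\mathbb{1}_{\min\hat f^{(N)}_{K(N)}<0}\le(2K(N)+1)\sup_{p<2K(N)}\big|\hat r^{(N)}(p)-r(p)\big|.$$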

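So, we obtain

$$\begin{aligned}
\|\Delta\|_{2,op} &\le\frac{4m'}{m^2}\Big(K(N)\sup_{p<2K(N)}\big|\hat r^{(N)}(p)-r(p)\big|+|\hat\alpha|\Big)+\frac{4}{m}K(N)\sup_{p<2K(N)}\big|\hat r^{(N)}(p)-r(p)\big|\\
&\le\Big(\frac{16m'}{m^2}+\frac{4}{m}+2\Big)K(N)\sup_{p<2K(N)}\big|\hat r^{(N)}(p)-r(p)\big|+\frac{m'}{m}\,\mathbb{1}_{\min\hat f^{(N)}_{K(N)}<m/4}.
\end{aligned}$$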



We will use here some other technical lemmas, whose proofs are also postponed to the last section. The first one gives a uniform concentration result on the estimators $\hat r^{(N)}(p)$:

Lemma 5.3. Assume that Assumption 3.1 holds. Then, there exists $N_0$ such that, for all $N\ge N_0$ and $x>0$,

$$\forall p<2K(N),\qquad\big|\hat r^{(N)}(p)-r(p)\big|\le 4m'\Big(\sqrt{\frac{\log(K(N))+x}{N}}+\frac{x}{N}\Big),$$

with probability at least $1-e^{-x}$.

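For ease of notation, we set $C_0 = 4m'\Big(\dfrac{16m'}{m^2}+\dfrac{4}{m}+2\Big)$ and $C_3 = \dfrac{m'}{m}$. For the computation of the mean, the interval $[0,+\infty[$ will be divided into three parts, where only the first contribution is significant, thanks to the exponential concentration. We will prove that the two other parts are negligible.

We obtain, for all $x>0$,

$$\|\Delta\|_{2,op}\le(C_0+o(1))\,K(N)\Big(\sqrt{\frac{\log(K(N))+x}{N}}+\frac{x}{N}\Big)+C_3\,\mathbb{1}_{\min\hat f^{(N)}_{K(N)}<m/4},$$

with probability at least $1-e^{-x}$.

Set $t_1 := \Big(C_0K(N)\sqrt{\dfrac{\log(K(N))}{N}}\Big)^4$. For $t\in[0,t_1]$, we use the inequality $\mathbb{P}(\|\Delta\|_{2,op}^4>t)\le 1$. We obtain the first contribution to the integral, which is also the non-negligible part:

$$\int_0^{t_1}\mathbb{P}\big(\|\Delta\|_{2,op}^4>t\big)\,dt\le t_1 = C_0^4K(N)^4\Big(\frac{\log(K(N))}{N}\Big)^2.$$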

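Now, set $t_2 := \Big(C_0K(N)\sqrt{\dfrac{\log(K(N))}{N}}+C_3\Big)^4$. For $t\in[t_1,t_2]$, we use

$$\mathbb{P}\Big(\|\Delta\|_{2,op}^4>\max\Big(C_0^4K(N)^4\Big(\frac{\log(K(N))+x}{N}\Big)^2,\ C_0^4K(N)^4\Big(\frac{x}{N}\Big)^4\Big)\Big)\le e^{-x}+\mathbb{P}\Big(\min\hat f^{(N)}_{K(N)}<\frac{m}{4}\Big).$$

Notice that Lemma 5.2 provides

$$\mathbb{P}\Big(\min\hat f^{(N)}_{K(N)}<\frac{m}{4}\Big)\le\mathbb{P}\Big((2K(N)+1)\sup_{p<2K(N)}\big|\hat r^{(N)}(p)-r(p)\big|>\frac{m}{2}\Big)\le e^{-x_0(N)},\qquad x_0(N) := \frac{Nm^2}{(64K(N)m')^2}.$$

Indeed: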
One can compute that, with probability at least $1-e^{-x_0(N)}$,

$$\sup_{p<2K(N)}\big|\hat r^{(N)}(p)-r(p)\big|\le 4m'\Big(\sqrt{\frac{\log(K(N))+x_0(N)}{N}}+\frac{x_0(N)}{N}\Big) = 4m'\Big(\sqrt{\frac{\log(K(N))}{N}+\frac{m^2}{(64K(N)m')^2}}+\frac{m^2}{(64K(N)m')^2}\Big)\le\frac{m}{8K(N)}$$

for $N$ large enough. Hence, $(2K(N)+1)\sup_{p<2K(N)}|\hat r^{(N)}(p)-r(p)|\le\frac{m}{4}+\frac{m}{8K(N)}<\frac{m}{2}$ on this event. So, we have

$$\mathbb{P}\Big(\|\Delta\|_{2,op}^4>\max\Big(C_0^4K(N)^4\Big(\frac{\log(K(N))+x}{N}\Big)^2,\ C_0^4K(N)^4\Big(\frac{x}{N}\Big)^4\Big)\Big)\le e^{-x}+e^{-x_0(N)}.$$

Finally, the following lemma (the proof is again postponed to the Appendix) will be useful to transform a probability inequality into an $L^2$ inequality.

Lemma 5.4. Let $X$ be a nonnegative random variable such that there exist two one-to-one maps $f_1$ and $f_2$ and a constant $C\ge 0$ with

$$\forall x>0,\qquad\mathbb{P}\big(X>\sup(f_1(x),f_2(x))\big)\le e^{-x}+C;$$

then

$$\mathbb{P}(X>t)\le e^{-f_1^{-1}(t)}+e^{-f_2^{-1}(t)}+C.$$

So, thanks to Lemma 5.4, we have

$$\mathbb{P}\big(\|\Delta\|_{2,op}^4>t\big)\le e^{-\frac{N\sqrt t}{C_0^2K(N)^2}+\log(K(N))}+e^{-\frac{Nt^{1/4}}{C_0K(N)}}+e^{-x_0(N)}.$$

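Now, we will prove that each term can be neglected. Integrating by parts, we obtain

$$\begin{aligned}
\int_{t_1}^{t_2}e^{-\frac{N\sqrt t}{C_0^2K(N)^2}+\log(K(N))}dt &\le\int_{t_1}^{+\infty}e^{-\frac{N\sqrt t}{C_0^2K(N)^2}+\log(K(N))}dt\\
&= \Big[-\frac{2\sqrt t\,C_0^2K(N)^2}{N}\,e^{-\frac{N\sqrt t}{C_0^2K(N)^2}+\log(K(N))}\Big]_{t_1}^{+\infty}+\int_{t_1}^{+\infty}\frac{C_0^2K(N)^2}{N\sqrt t}\,e^{-\frac{N\sqrt t}{C_0^2K(N)^2}+\log(K(N))}dt\\
&= \frac{2\log(K(N))\,C_0^4K(N)^4}{N^2}+\frac{2C_0^4K(N)^4}{N^2},
\end{aligned}$$

which is $o\big(C_0^4K(N)^4(\log(K(N))/N)^2\big)$.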
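The integration by parts above can be double-checked numerically. With $a = N/(C_0^2K(N)^2)$ and $\sqrt{t_1} = C_0^2K(N)^2\log(K(N))/N$, so that $a\sqrt{t_1}=\log(K(N))$, the change of variable $s=\sqrt t$ gives $\int_{t_1}^{+\infty}e^{-a\sqrt t+\log K(N)}dt = 2\sqrt{t_1}/a+2/a^2 = 2C_0^4K(N)^4(\log(K(N))+1)/N^2$. The sketch below (Python/NumPy, hypothetical values of $N$, $K(N)$ and $C_0$, illustrative function names) compares a trapezoidal approximation with this closed form.

```python
import numpy as np

def tail_integral_closed_form(N, K, C0):
    """2 C0^4 K^4 log K / N^2 + 2 C0^4 K^4 / N^2 (the integration-by-parts result)."""
    return 2 * C0**4 * K**4 * (np.log(K) + 1.0) / N**2

def tail_integral_numeric(N, K, C0, n=400_001):
    """Trapezoidal value of int_{t1}^inf exp(-N sqrt(t) / (C0^2 K^2) + log K) dt,
    computed after the change of variable s = sqrt(t), dt = 2 s ds."""
    a = N / (C0**2 * K**2)
    s1 = C0**2 * K**2 * np.log(K) / N            # s1 = sqrt(t1)
    s = np.linspace(s1, s1 + 200.0 / a, n)       # integrand is ~ e^{-200} beyond this range
    y = 2 * s * np.exp(np.log(K) - a * s)
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(s)))

N, K, C0 = 10_000, 10, 1.0                       # hypothetical values
print(tail_integral_numeric(N, K, C0), tail_integral_closed_form(N, K, C0))
```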
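Then,

$$\int_{t_1}^{t_2}e^{-\frac{Nt^{1/4}}{C_0K(N)}}dt\le t_2\,e^{-\frac{Nt_1^{1/4}}{C_0K(N)}} = t_2\,e^{-\sqrt{N\log(K(N))}} = o\Big(C_0^4K(N)^4\Big(\frac{\log(K(N))}{N}\Big)^2\Big).$$

So that,

$$\int_{t_1}^{t_2}e^{-x_0(N)}dt\le t_2\,e^{-\frac{Nm^2}{(64K(N)m')^2}} = o\Big(C_0^4K(N)^4\Big(\frac{\log(K(N))}{N}\Big)^2\Big).$$

Leading to

$$\int_{t_1}^{t_2}\mathbb{P}\big(\|\Delta\|_{2,op}^4>t\big)\,dt = o\Big(C_0^4K(N)^4\Big(\frac{\log(K(N))}{N}\Big)^2\Big).$$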
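Finally, for $t\in[t_2,+\infty[$, we use

$$\mathbb{P}\Big(\|\Delta\|_{2,op}^4>\max\Big[\Big(C_0K(N)\sqrt{\frac{\log(K(N))+x}{N}}+C_3\Big)^4,\ \Big(C_0K(N)\frac{x}{N}+C_3\Big)^4\Big]\Big)\le e^{-x}.$$

Thanks to Lemma 5.4, we get

$$\mathbb{P}\big(\|\Delta\|_{2,op}^4>t\big)\le e^{-\frac{N(t^{1/4}-C_3)^2}{C_0^2K(N)^2}+\log(K(N))}+e^{-\frac{N(t^{1/4}-C_3)}{C_0K(N)}}.$$

So, integrating by parts once more (with the change of variable $u = t^{1/4}-C_3$ and $u_2 := t_2^{1/4}-C_3 = C_0K(N)\sqrt{\log(K(N))/N}$), we obtain

$$\int_{t_2}^{+\infty}e^{-\frac{N(t^{1/4}-C_3)^2}{C_0^2K(N)^2}+\log(K(N))}dt = \int_{u_2}^{+\infty}4(u+C_3)^3e^{-\frac{Nu^2}{C_0^2K(N)^2}+\log(K(N))}du\le P_1(u_2,N,K(N))\,e^{-\frac{Nu_2^2}{C_0^2K(N)^2}+\log(K(N))}.$$

Here, $P_1(u,N,K(N))$ is a polynomial of degree 3 in $u$ and a rational function in $N$ and $K(N)$.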
Furthermore,

$$\int_{t_2}^{+\infty}e^{-\frac{N(t^{1/4}-C_3)}{C_0K(N)}}dt = \int_{u_2}^{+\infty}4(u+C_3)^3e^{-\frac{Nu}{C_0K(N)}}du = P_2(u_2,N,K(N))\,e^{-\frac{Nu_2}{C_0K(N)}} = P_2(u_2,N,K(N))\,e^{-\sqrt{N\log(K(N))}},$$

where $P_2(u,N,K(N))$ is a polynomial of degree 3 in $u$ and a rational function in $N$ and $K(N)$. We proved here

$$\int_{\mathbb{R}^+}\mathbb{P}\big(\|\Delta\|_{2,op}^4>t\big)\,dt\le C_0^4K(N)^4\Big(\frac{\log(K(N))}{N}\Big)^2+o\Big(K(N)^4\Big(\frac{\log(K(N))}{N}\Big)^2\Big).$$

This ends the proof. □



5.4 Technical lemmas 

We prove now the technical lemmas: 



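Proof of Lemma 5.3. Notice that $\hat r^{(N)}(p) = X^TT_N(g_p)X$ with $g_p(t) = \dfrac{\cos(pt)}{N-p}$. We use the following proposition from Comte [10]. Let $X_1,\dots,X_n$ be a centered Gaussian stationary sequence with spectral density $f$, and $g$ a bounded function such that $T_n(g)$ is a symmetric non-negative matrix. Then the following concentration inequality holds for $Z_n(g) = \frac{1}{n}\big(X^TT_n(g)X-\mathbb{E}[X^TT_n(g)X]\big)$:

$$\mathbb{P}\Big(Z_n(g)>2\|f\|_\infty\big(\|g\|_2\sqrt{x}+\|g\|_\infty x\big)\Big)\le e^{-nx}.$$

By applying this result respectively with $g_p$ and $-g_p$, we obtain

$$\mathbb{P}\Big(\big|\hat r^{(N)}(p)-r(p)\big|>2m'\frac{N}{N-p}\big(\sqrt{x}+x\big)\Big)\le 2e^{-Nx},$$

or, equivalently, the event

$$\big|\hat r^{(N)}(p)-r(p)\big|>2m'\frac{N}{N-p}\Big(\sqrt{\frac{x+\log(K(N))+2\log(2)}{N}}+\frac{x+\log(K(N))+2\log(2)}{N}\Big)$$

has probability lower than $\dfrac{e^{-x}}{2K(N)}$. By taking an equivalent, we obtain that there exists $N_0$ such that, for all $N\ge N_0$ and all $p<2K(N)$,

$$\mathbb{P}\Big(\big|\hat r^{(N)}(p)-r(p)\big|>4m'\Big(\sqrt{\frac{\log(K(N))+x}{N}}+\frac{x}{N}\Big)\Big)\le\frac{e^{-x}}{2K(N)},$$

and a union bound over the $2K(N)$ lags $p<2K(N)$ concludes.

□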
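Two ingredients of this proof can be checked mechanically: that $\hat r^{(N)}(p)$ is the quadratic form $X^TT_N(g_p)X$ with $g_p(t)=\cos(pt)/(N-p)$, and that the level $(x+\log(K(N))+2\log 2)/N$ makes the union bound over the $2K(N)$ lags (each with the two-sided bound $2e^{-Nx'}$) sum to $e^{-x}$. A minimal Python/NumPy sketch with hypothetical data, parameters and helper names:

```python
import numpy as np

def toeplitz_g_p(N, p):
    """T_N(g_p) for g_p(t) = cos(pt) / (N - p): the Fourier coefficients of g_p are
    1/(2(N-p)) at +-p when p >= 1, and 1/N at 0 when p = 0."""
    lags = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    weight = 1.0 / N if p == 0 else 1.0 / (2.0 * (N - p))
    return (lags == p) * weight

def level(x, K, N):
    """Per-lag level (x + log K + 2 log 2) / N used in the union bound."""
    return (x + np.log(K) + 2 * np.log(2)) / N

X = np.random.default_rng(4).standard_normal(50)   # hypothetical centered path
N = len(X)
for p in (0, 3):
    quad = X @ toeplitz_g_p(N, p) @ X
    direct = X[: N - p] @ X[p:] / (N - p)
    print(p, np.isclose(quad, direct))  # True for both lags

K, x = 7, 1.5                                          # hypothetical K(N) and confidence level
total = (2 * K) * 2 * np.exp(-N * level(x, K, N))      # 2K(N) lags times 2 e^{-N x'}
print(np.isclose(total, np.exp(-x)))  # True
```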



Proof of Lemma 5.4. We set $t = \sup(f_1(x),f_2(x))$. If $t = f_1(x)$, then

$$\mathbb{P}(X>t)\le e^{-f_1^{-1}(t)}+C\le e^{-f_1^{-1}(t)}+e^{-f_2^{-1}(t)}+C.$$

Symmetrically, if $t = f_2(x)$, we have

$$\mathbb{P}(X>t)\le e^{-f_1^{-1}(t)}+e^{-f_2^{-1}(t)}+C.$$

□

Proof of Lemma 5.2. It is sufficient to ensure that the bias is small enough. Choose $N$ such that

$$2\|f^*\|_{W_s}K(N)^{-s+1}<\frac{m}{4}.$$

Then we use

$$\begin{aligned}
\big\|\hat f^{(N)}_{K(N)}-f^*\big\|_\infty &\le\sum_{p=-K(N)}^{K(N)}\big|\hat r^{(N)}(|p|)-r(p)\big|+2\sum_{p>K(N)}|r(p)|\\
&\le(2K(N)+1)\sup_{p<2K(N)}\big|\hat r^{(N)}(p)-r(p)\big|+2\|f^*\|_{W_s}K(N)^{-s+1}\\
&\le(2K(N)+1)\sup_{p<2K(N)}\big|\hat r^{(N)}(p)-r(p)\big|+\frac{m}{4}.
\end{aligned}$$

This ends the proof of the last lemma. □